MGS3701 Data Mining, Spring 2025
Wed, 26 February 2025
In this chapter, we give an overview of the steps involved in data mining, starting from a clear goal definition and ending with model deployment. The general steps are shown schematically (Shmueli, Bruce, Yahav, Patel, & Lichtendahl Jr, 2017).
Collaborative filtering
Collaborative filtering is a method that uses individual users’ preferences and tastes, as revealed by their historic purchase, rating, browsing, or any other measurable behavior indicative of preference, together with other users’ history.
Association rules vs Collaborative filtering
In contrast to association rules that generate rules general to an entire population, collaborative filtering generates “what goes with what” at the individual user level. Hence, collaborative filtering is used in many recommendation systems that aim to deliver personalized recommendations to users with a wide range of preferences.
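To make the "individual user level" idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. The users, items, ratings, and the function names `cosine_similarity` and `recommend` are all hypothetical, and cosine similarity over co-rated items is only one of several similarity measures that could be used here.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann":  {"A": 5, "B": 4, "C": 1},
    "bob":  {"A": 4, "B": 5, "D": 4},
    "cara": {"A": 1, "C": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(target, ratings, top_n=1):
    """Rank items the target has not rated, using other users'
    ratings weighted by their similarity to the target."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(ratings[target], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average rating for each candidate item
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend("ann", ratings))
```

Because the prediction for each user depends on that user's own rating history, two users can receive different recommendations from the same data, which is the contrast with population-wide association rules.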
Visualization can be greatly enhanced by adding features such as color and interactive navigation.
Linear Regression as Supervised Machine Learning Algorithm
Note
For example, unsupervised clustering methods are used to separate loan applicants into several risk-level groups. Then, supervised algorithms are applied separately to each risk-level group for predicting propensity of loan default.
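The two-stage idea in this note can be sketched as follows. This is an illustration under stated assumptions, not a production workflow: the applicant records are invented, the clustering is a tiny one-dimensional k-means on credit score, and the supervised step is a least-squares line fitted per group (a linear probability model). In practice a classifier such as logistic regression would be a more usual choice for a 0/1 outcome. The helper names `kmeans_1d` and `fit_line` are hypothetical.

```python
from statistics import mean

# Hypothetical applicants: (credit_score, debt_to_income, defaulted 0/1)
applicants = [
    (720, 0.10, 0), (700, 0.20, 0), (690, 0.35, 0), (710, 0.45, 1),
    (560, 0.15, 0), (540, 0.30, 1), (580, 0.25, 0), (550, 0.50, 1),
]

def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (assumes k >= 2); returns centers and one label per value."""
    ordered = sorted(values)
    # Deterministic start: spread initial centers across the sorted values
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return centers, labels

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope

# Step 1 (unsupervised): group applicants into risk levels by credit score
scores = [a[0] for a in applicants]
centers, labels = kmeans_1d(scores, k=2)

# Step 2 (supervised): fit a separate model within each risk group
models = {}
for group in sorted(set(labels)):
    rows = [a for a, lab in zip(applicants, labels) if lab == group]
    models[group] = fit_line([r[1] for r in rows], [r[2] for r in rows])

for group, (intercept, slope) in models.items():
    print(f"group {group}: center={centers[group]}, "
          f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Fitting one model per group lets the relationship between debt ratio and default differ across risk levels, which a single pooled model would average away.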
The foregoing steps encompass the steps in SEMMA, a methodology developed by the software company SAS: Sample, Explore, Modify, Model, and Assess.
IBM SPSS Modeler (previously SPSS-Clementine) has a similar methodology, termed CRISP-DM (CRoss-Industry Standard Process for Data Mining). All these frameworks include the same main steps involved in predictive modeling.
KDD Model: Knowledge Discovery in Databases (KDD) is a systematic process that seeks to identify valid, novel, potentially useful, and ultimately understandable patterns from large amounts of data. In simpler terms, it’s about transforming raw data into valuable knowledge.
CRISP-DM: CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a cyclical process that provides a structured approach to planning, organizing, and implementing a data mining project. The process consists of six major phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
© 2025 Chad (Chungil Chae). All rights reserved.